# From your file menu (at the top of RStudio) select:
# "Session -> Set working directory -> To source file location"
# Then play this chunk to get the data into R.
library(mosaic)
library(car)
library(DT)
library(pander)
library(readr)
library(plotly)
food <- read_csv("../../Data/food.csv") #food.csv is in the Data folder...
food$Grade_Level <- case_when(food$grade_level==1 ~ 'Freshmen',
                             food$grade_level==2 ~ 'Sophomores',
                             food$grade_level==3 ~ 'Juniors',
                             food$grade_level==4 ~ 'Seniors')
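
One possible refinement, sketched here on a small made-up data frame (`toy` and its values are hypothetical, not taken from food.csv): storing the labels as a factor with explicit levels keeps tables and plots in class order (Freshmen, Sophomores, Juniors, Seniors) instead of alphabetical order. This uses only base R's `factor()` and is an illustration, not a change to the analysis itself.

```r
# Hypothetical toy data standing in for food.csv (values invented for illustration)
toy <- data.frame(grade_level = c(3, 1, 4, 2, 1))

# Labels in class order; position i is the label for grade_level == i
class_labels <- c("Freshmen", "Sophomores", "Juniors", "Seniors")

# factor() maps codes 1-4 to the labels and fixes the display order
toy$Grade_Level <- factor(toy$grade_level, levels = 1:4, labels = class_labels)

levels(toy$Grade_Level)   # levels follow class order, not alphabetical order
table(toy$Grade_Level)    # counts are reported in that same order
```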

Background

I want to see whether students who have spent more years in college give a rating closer to agree or strongly agree for the statement “I feel very healthy!” My assumption is that the more time students spend in college, the stronger they become as critical thinkers and the more they reflect on their daily habits. I also think they eat out less because they are more concerned with saving money for when they finish school. Hence, I believe that students who are closer to leaving college will, on average, give a rating closer to agree or strongly agree for the “I feel very healthy!” statement.

Introduction

I want to conduct a Kruskal-Wallis test to see whether juniors and seniors see themselves as healthier than freshmen and sophomores do. As previously mentioned, I will look at ratings for the statement “I feel very healthy!” on a scale where 1 is strongly agree and 10 is strongly disagree. I am assuming students in their later years of college will have a rating closer to 1 (strongly agree).

Questions and Hypotheses

The question I am trying to answer is:

Do ratings for the statement “I feel very healthy!” tend to differ for at least one college classification group (freshmen, sophomores, juniors, seniors)?

\[ H_0: \text{Ratings for all college groups come from the same distribution} \]

\[ H_a: \text{At least one college group's ratings are stochastically different from the others} \]

The significance level will be α = 0.05.

Data Analysis

student.aov <- aov(healthy_feeling ~ Grade_Level, data=food)
#summary(student.aov) %>% 
#pander()
plot(student.aov, which=1, pch=16)

The residuals versus fitted-values plot shows roughly constant spread across the groups, so the constant-variance requirement for an ANOVA test appears to be satisfied.

qqPlot(student.aov$residuals, id=FALSE, main='QQPlot')

Several points in the Q-Q plot fall outside the boundary lines, so the residuals do not appear to be normally distributed. Because the normality requirement for ANOVA is not met, I will perform a Kruskal-Wallis test instead, which does not require normality.
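
The visual checks above could be supplemented with formal tests. The sketch below is only an illustration under stated assumptions: `toy` is an invented data frame with random ratings (not the survey data), and the Shapiro-Wilk and Fligner-Killeen tests from base R are one possible choice, not the method used in this analysis.

```r
# Hypothetical stand-in for the food data: invented ratings, not real survey results
set.seed(42)
toy <- data.frame(
  Grade_Level = factor(rep(c("Freshmen", "Sophomores", "Juniors", "Seniors"), each = 30)),
  healthy_feeling = sample(1:10, 120, replace = TRUE)
)

toy.aov <- aov(healthy_feeling ~ Grade_Level, data = toy)

# Shapiro-Wilk test on the ANOVA residuals:
# a small p-value suggests the residuals are not normal
sw <- shapiro.test(residuals(toy.aov))

# Fligner-Killeen test of equal variances across groups:
# a small p-value suggests the group variances differ
fk <- fligner.test(healthy_feeling ~ Grade_Level, data = toy)

sw$p.value
fk$p.value
```

With the real data, `food` would replace `toy` in both calls.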

kruskal.test(healthy_feeling ~ Grade_Level, data = food) %>% pander()

Kruskal-Wallis rank sum test: healthy_feeling by Grade_Level

| Test statistic | df | P value |
|---|---|---|
| 2.276 | 3 | 0.5172 |

The Kruskal-Wallis test gives a P-value of 0.5172, which is greater than the significance level of α = 0.05. Therefore, we have insufficient evidence to reject the null hypothesis: the data do not show a significant difference in healthy feeling ratings among the freshmen, sophomores, juniors, and seniors groups.
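
As a quick check on the reported output, the Kruskal-Wallis P-value is the upper-tail probability of a chi-squared distribution evaluated at the test statistic, with degrees of freedom equal to the number of groups minus one. The sketch below reproduces it from the rounded values in the output, so small rounding differences are expected.

```r
# Values copied from the Kruskal-Wallis output
test_statistic <- 2.276
df <- 3  # number of groups minus one (4 - 1)

# Upper-tail chi-squared probability at the test statistic
p_value <- pchisq(test_statistic, df = df, lower.tail = FALSE)

round(p_value, 3)  # approximately 0.517

# Compare against the significance level
alpha <- 0.05
p_value > alpha    # TRUE means we fail to reject the null hypothesis
```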

Graphical Summary

plot_ly(food, y=~healthy_feeling, x=~Grade_Level, type='box', fillcolor='red', line=list(color='black', width=3), marker = list(color='orange', line = list(color='red', width=1))) %>%
layout(title='Healthy Feeling Rating by College Classification \n (Freshmen, Sophomores, Juniors, Seniors)', xaxis=list(title='College Classification'), yaxis=list(title='Healthy Feeling Rating'))
#xyplot(healthy_feeling ~ as.factor(GradeLevel), data=food, main='Healthy Feeling Rating by College Classification \n (Freshman, Sophomores, Juniors, Seniors)', xlab='College Classification', ylab='Healthy Feeling', type=c("p","a"), col='red')

The boxplot shows the minimum, Q1, median, Q3, and maximum healthy feeling rating for freshmen, sophomores, juniors, and seniors. The median rating is very similar for all groups: 6 for freshmen and juniors, and 5 for sophomores and seniors.

Numerical Summary

Summary and Sample Size for Freshmen, Sophomores, Juniors, and Seniors

food %>% 
   group_by(Grade_Level) %>% 
   summarise(Mean_Rating = mean(healthy_feeling),
             Median_Rating = median(healthy_feeling),
             sd_Rating = sd(healthy_feeling), 
             Sample_Size = n()) %>%
  pander()

| Grade_Level | Mean_Rating | Median_Rating | sd_Rating | Sample_Size |
|---|---|---|---|---|
| Freshmen | 5.351 | 6 | 2.751 | 37 |
| Juniors | 6.071 | 6 | 2.403 | 28 |
| Seniors | 5.393 | 5 | 2.393 | 28 |
| Sophomores | 5.094 | 5 | 2.728 | 32 |

My numerical summary shows the mean, median, and standard deviation of the healthy feeling rating for each group (freshmen, sophomores, juniors, and seniors), along with each group's sample size. The standard deviations are similar across groups, ranging from 2.393 to 2.751. The mean ratings are also close: freshmen, sophomores, and seniors each average a little above 5 (5.351, 5.094, and 5.393), while juniors average about 6 (6.071). The median rating is 6 for freshmen and juniors and 5 for seniors and sophomores, so each group's median is close to its mean.

Interpretation

The data do not show a statistically significant difference in healthy feeling ratings among freshmen, sophomores, juniors, and seniors. We failed to reject the null hypothesis, and the graphical and numerical summaries show how close the ratings are for each group. Therefore, the assumption I made at the beginning, that students in their later years of college would rate themselves as healthier, is not supported by these data.